<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>Regular expressions in Ruby</title>
<link rel="stylesheet" href="/cfg/format.css" type="text/css">
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<meta name="keywords" content="Ruby, tutorial, programming language, learn Ruby, Linux, regular expressions">
<meta name="description" content="In this part of the Ruby tutorial, we cover regular expressions.">
<meta name="language" content="en">
<meta name="author" content="Jan Bodnar">
<meta name="distribution" content="global">

<script type="text/javascript" src="/lib/jquery.js"></script>
<script type="text/javascript" src="/lib/common.js"></script>

</head>

<body>

<div class="container2">



<div class="content2">




<h1>Regular expressions in Ruby</h1>

<p>
In this part of the Ruby tutorial, we will talk about regular expressions in Ruby.
</p>

<p>
Regular expressions are used for text searching and more advanced text manipulation. 
They are built into tools like grep and sed, text editors like vi and emacs,
and programming languages like Tcl, Perl, and Python. Ruby has built-in support 
for regular expressions too.
</p>


<p>
From another point of view, a regular expression is a domain-specific language
for matching text. 
</p>

<p>
A <b>pattern</b> is a regular expression that defines the text we are searching for
or manipulating. It consists of text literals and metacharacters. The pattern is
placed between two delimiters. In Ruby, these are the // characters. They tell
the regex engine where the pattern starts and ends. 
</p>

<p>
Here is a partial list of metacharacters:
</p>

<table>
<tbody>
<tr><td>.</td><td>Matches any single character.</td></tr>
<tr><td>*</td><td>Matches the preceding element zero or more times.</td></tr>
<tr><td>[  ]</td><td>Bracket expression. Matches a character within the brackets.</td></tr>
<tr><td>[^  ]</td><td>Matches a single character that is not contained within the brackets.</td></tr>
<tr><td>^</td><td>Matches the starting position of a line.</td></tr>
<tr><td>$</td><td>Matches the ending position of a line.</td></tr>
<tr><td>|</td><td>Alternation operator.</td></tr>
</tbody>
</table>


<p>
In Ruby, regular expressions are objects of the <code>Regexp</code> class. 
In addition to <code>Regexp.new</code>, there are two shorthand ways to create regular expressions. The following
example shows all three. 
</p>

<pre class="code">
#!/usr/bin/ruby

re = Regexp.new 'Jane'
p "Jane is hot".match re

p "Jane is hot" =~ /Jane/
p "Jane is hot".match %r{Jane}
</pre> 
 
<p> 
In the first example, we show three ways of applying regular
expressions to a string. 
</p> 

<pre class="explanation">
re = Regexp.new 'Jane'
p "Jane is hot".match re
</pre>

<p>
In the above two lines, we create a <code>Regexp</code> object containing
a simple regular expression. Using the <code>match</code> method, we
apply this regular expression object to the "Jane is hot" sentence. We check
whether the word 'Jane' is inside the sentence. 
</p>

<pre class="explanation">
p "Jane is hot" =~ /Jane/
p "Jane is hot".match %r{Jane}
</pre>

<p>
These two lines perform the same search. The two forward slashes // and the %r{}
notation are shorthands for the more verbose <code>Regexp.new</code> form. 
In this tutorial, we will use the forward slashes. This is a de facto
standard in many languages. 
</p>

<pre>
$ ./regex.rb
#&lt;MatchData "Jane"&gt;
0
#&lt;MatchData "Jane"&gt;
</pre>

<p>
In all three cases there is a match. The <code>match</code> method returns
a <code>MatchData</code> object, or <code>nil</code> if there is no match. The =~ operator
returns the index of the first character of the matched text, or <code>nil</code> otherwise.
</p>
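<p>
The difference between the two return values can be seen in a short sketch. 
The <code>sentence</code> variable and the /hot/ and /cold/ patterns are chosen 
here only for illustration; they are not part of the script above.
</p>

```ruby
#!/usr/bin/ruby

sentence = "Jane is hot"

# match returns a MatchData object; [0] is the whole matched text
m = sentence.match(/hot/)
p m[0]         # "hot"
p m.begin(0)   # 8, the index where the match starts

# =~ returns that start index directly
idx = sentence =~ /hot/
p idx          # 8

# both give nil when nothing matches
miss = sentence.match(/cold/)
p miss
p sentence =~ /cold/
```

<p>
Since <code>nil</code> is falsy and both 0 and a <code>MatchData</code> object 
are truthy in Ruby, either form can be used directly in an if condition.
</p>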


<h2>The dot character</h2>

<p> 
The dot character is a metacharacter that matches any single 
character (by default, any character except a newline). There must be some character in that position; it may not be omitted.
</p> 

<pre class="code">
#!/usr/bin/ruby

p "Seven".match /.even/
p "even".match /.even/
p "eleven".match /.even/
p "proven".match /.even/
</pre> 
 
<p> 
In this example, we use the <code>match</code> method
to apply a regular expression to several strings. The <code>match</code> method
returns the matched data on success or <code>nil</code> otherwise.
</p> 

<pre class="explanation">
p "Seven".match /.even/
</pre>

<p>
The "Seven" is the string on which we call the <code>match</code> method.
The parameter of the method is the pattern. The /.even/
pattern looks for text that consists of an arbitrary character followed
by the 'even' characters. 
</p>

<pre>
$ ./dot.rb
#&lt;MatchData "Seven"&gt;
nil
#&lt;MatchData "leven"&gt;
nil
</pre>

<p>
From the output we can see which strings did match and which did not. 
</p>
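<p>
One detail worth a small sketch: by default the dot does not match a newline 
character, while the <code>m</code> option lets it do so. The <code>text</code> 
string below is made up for this illustration.
</p>

```ruby
#!/usr/bin/ruby

# the character right before "even" is a newline
text = "first\neven"

# without options, the dot refuses to match the newline
plain = text.match(/.even/)
p plain

# with the m option, the dot matches the newline too
multi = text.match(/.even/m)
p multi[0]     # "\neven"
```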

<hr class="btm">

<p>
As we have said above, if there is a dot character, there must be
a character in that position. It may not be omitted. What if we wanted to
search for a text in which the character might be omitted? In other
words, we want a pattern for both 'seven' and 'even'. For this, we
can use the ? repetition character. The ? repetition character says 
that the preceding character may be present 0 or 1 time. 
</p>

<pre class="code">
#!/usr/bin/ruby

p "seven".match /.even/
p "even".match /.even/
p "even".match /.?even/
</pre> 
 
<p> 
The script uses the ? repetition character. 
</p> 

<pre class="explanation">
p "even".match /.even/
</pre>

<p>
This line prints <code>nil</code> since the regular expression
expects one character before the 'even' string. 
</p>

<pre class="explanation">
p "even".match /.?even/
</pre>

<p>
Here we have slightly modified the regular expression. The '.?' stands
for no character or one arbitrary character. This time there is a match.
</p>

<pre>
$ ./dot2.rb
#&lt;MatchData "seven"&gt;
nil
#&lt;MatchData "even"&gt;
</pre>

<p>
Example output.
</p>


<h2>Regular expression methods</h2>

<p>
In the previous two examples, we have used the <code>match</code> method
to work with regular expressions. There are other methods where we can 
apply regular expressions. 
</p>

<pre class="code">
#!/usr/bin/ruby

puts "motherboard" =~ /board/
puts "12, 911, 12, 111"[/\d{3}/]

puts "motherboard".gsub /board/, "land"

p "meet big deep nil need".scan /.[e][e]./
p "This is Sparta!".split(/\s/)
</pre>

<p>
The example shows some methods that can work with regular expressions. 
</p>

<pre class="explanation">
puts "motherboard" =~ /board/
</pre>

<p>
The =~ is an operator that applies the regular expression on the right
to the string on the left. 
</p>

<pre class="explanation">
puts "12, 911, 12, 111"[/\d{3}/]
</pre>

<p>
A regular expression can be placed between the square brackets following
the string. This line prints the first substring that consists of three digits. 
</p>

<pre class="explanation">
puts "motherboard".gsub /board/, "land"
</pre>

<p>
With the <code>gsub</code> method, we replace every occurrence of the 'board' string with
the 'land' string. 
</p>

<pre class="explanation">
p "meet big deep nil need".scan /.[e][e]./
</pre>

<p>
The <code>scan</code> method looks for matches in the string. 
It looks for all occurrences, not just the first one. The line prints
all substrings that match the pattern. 
</p>


<pre class="explanation">
p "This is Sparta!".split(/\s/)
</pre>

<p>
The <code>split</code> method splits a string using a given regular 
expression as a separator. The \s character class stands for any 
whitespace character. 
</p>

<pre>
$ ./apply.rb
6
911
motherland
["meet", "deep", "need"]
["This", "is", "Sparta!"]
</pre>

<p>
Output of the apply.rb script. 
</p>
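<p>
As a small supplement, the following sketch contrasts <code>gsub</code> with 
its sibling <code>sub</code>, which replaces only the first occurrence, and 
shows <code>split</code> with a slightly richer separator. The sample strings 
are assumptions made up for this illustration.
</p>

```ruby
#!/usr/bin/ruby

text = "board on board"

# sub replaces only the first match
first_only = text.sub(/board/, "land")
puts first_only      # land on board

# gsub replaces every match
all_replaced = text.gsub(/board/, "land")
puts all_replaced    # land on land

# split on a comma followed by any amount of whitespace
parts = "12, 911, 12, 111".split(/,\s*/)
p parts              # ["12", "911", "12", "111"]
```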


<h2>Special variables</h2>

<p>
Some of the methods that work with regular expressions set
a few special variables. They contain the last matched string, the string
before the last match, and the string after the last match. These
variables make the job easier for a programmer. 
</p>

<pre class="code">
#!/usr/bin/ruby

puts "Her name is Jane" =~ /name/

p $`
p $&amp;
p $'
</pre>

<p>
The example shows three special variables. 
</p>

<pre class="explanation">
puts "Her name is Jane" =~ /name/
</pre>

<p>
In this line we have a simple regular expression match. 
We look for the 'name' string inside the 'Her name is Jane' 
sentence. We use the =~ operator. This operator also sets 
the three special variables. The line prints the number 4, which is
the position at which the match starts. 
</p>

<pre class="explanation">
p $`
</pre>

<p>
The $` special variable contains the text before the last match.
</p>

<pre class="explanation">
p $&amp;
</pre>

<p>
The $&amp; has the matched text. 
</p>

<pre class="explanation">
p $'
</pre>

<p>
And the $' variable contains the text after the last match. 
</p>

<pre>
$ ./svars.rb
4
"Her "
"name"
" is Jane"
</pre>

<p>
Output of the example. 
</p>
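<p>
The same three pieces of text are also available from the <code>MatchData</code> 
object returned by <code>match</code>, which some programmers find more readable 
than the special variables. A minimal sketch using the same sentence:
</p>

```ruby
#!/usr/bin/ruby

m = "Her name is Jane".match(/name/)

p m.pre_match    # "Her "      (same text as $`)
p m[0]           # "name"      (same text as $&)
p m.post_match   # " is Jane"  (same text as $')
```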


<h2>Anchors</h2>

<p>
Anchors match positions inside a given text rather than characters.
We will deal with three anchors. The ^ character
matches the beginning of a line. The $ character matches the
end of a line. The \b sequence matches word boundaries. 
</p>

<pre class="code">
#!/usr/bin/ruby

sen1 = "Everywhere I look I see Jane"
sen2 = "Jane is the best thing that happened to me"

p sen1.match /^Jane/ 
p sen2.match /^Jane/ 

p sen1.match /Jane$/ 
p sen2.match /Jane$/ 
</pre>

<p>
In the first example, we work with the ^ and the $
anchoring characters. 
</p>

<pre class="explanation">
sen1 = "Everywhere I look I see Jane"
sen2 = "Jane is the best thing that happened to me"
</pre>

<p>
We have two sentences. The word 'Jane' is located at the
end of the first one and at the beginning of the second 
one. 
</p>

<pre class="explanation">
p sen1.match /^Jane/ 
p sen2.match /^Jane/ 
</pre>

<p>
Here we look if the word 'Jane' is at the beginning of the
two sentences.
</p>

<pre class="explanation">
p sen1.match /Jane$/ 
p sen2.match /Jane$/ 
</pre>

<p>
Here we look for a match of a text at the end of
the sentences. 
</p>

<pre>
$ ./anchors.rb
nil
#&lt;MatchData "Jane"&gt;
#&lt;MatchData "Jane"&gt;
nil
</pre>

<p>
These are the results. 
</p>
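<p>
Because ^ and $ work on lines, they can also match in the middle of a 
multi-line string. When the whole string must be anchored, Ruby offers the 
\A and \z anchors. The following sketch, with a made-up two-line 
<code>text</code>, shows the difference.
</p>

```ruby
#!/usr/bin/ruby

text = "Everywhere I look\nJane is there"

# ^ matches at the start of any line, so the second line counts
line_start = text.match(/^Jane/)
p line_start

# \A matches only at the very start of the string
string_start = text.match(/\AJane/)
p string_start

# $ matches at the end of any line
line_end = text.match(/look$/)
p line_end

# \z matches only at the very end of the string
string_end = text.match(/look\z/)
p string_end
```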

<hr class="btm">

<p>
A common request is to include only a match of a whole word. By
default we count any match, including a match in larger or compound
words. Let us look at an example to clarify things. 
</p>

<pre class="code">
#!/usr/bin/ruby

text = "The cat also known as the domestic cat is a small, 
usually furry, domesticated, carnivorous mammal."

p text.scan /cat/

p $`
p $&amp;
p $'
</pre>

<p>
We have a sentence. Within this sentence, we look for 
the string 'cat'. With the help of the <code>scan</code> method,
we look for all 'cat' strings in the sentence, not just
the first occurrence. 
</p>

<pre class="explanation">
text = "The cat also known as the domestic cat is a small, 
usually furry, domesticated, carnivorous mammal."
</pre>

<p>
The problem is that inside the text there are three 'cat' strings.
In addition to the two 'cat' words referring to the mammal, there is also
a match inside the word 'domesticated', which is not what we are
looking for in this case. 
</p>

<pre>
$ ./boudaries.rb
["cat", "cat", "cat"]
"The cat also known as the domestic cat is a small, \nusually furry, domesti"
"cat"
"ed, carnivorous mammal."
</pre>

<p>
Running the example shows three matches. The special variables show
the text before and after the last match, which is inside the 'domesticated' adjective. 
This is not what we want. In the next example, we will improve our 
regular expression to look only for whole word matches. 
</p>

<hr class="btm">

<p>
The <code>\b</code> character is used to set boundaries to the words
we are looking for. 
</p>

<pre class="code">
#!/usr/bin/ruby

text = "The cat also known as the domestic cat is a small, 
usually furry, domesticated, carnivorous mammal."

p text.scan /\bcat\b/

p $`
p $&amp;
p $'
</pre>

<p>
The example is improved by including the \b metacharacter. 
</p>

<pre class="explanation">
p text.scan /\bcat\b/
</pre>

<p>
With the above regular expression, we look for 'cat' strings
as whole words. We do not count subwords. 
</p>

<pre>
$ ./boudaries2.rb
["cat", "cat"]
"The cat also known as the domestic "
"cat"
" is a small, \nusually furry, domesticated, carnivorous mammal."
</pre>

<p>
This time there are two matches. And the special variables show 
correctly the text before and after the last match. 
</p>
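<p>
When the word to search for is held in a variable, the boundary pattern can be 
built with interpolation. This is only a sketch; the <code>word</code> variable 
is an assumption for the example, and <code>Regexp.escape</code> is used so that 
any metacharacters in the word are treated literally.
</p>

```ruby
#!/usr/bin/ruby

text = "The cat also known as the domestic cat is a small, usually furry, domesticated, carnivorous mammal."
word = "cat"

# every occurrence, including the one inside 'domesticated'
loose = text.scan(/#{Regexp.escape(word)}/).length

# whole words only
whole = text.scan(/\b#{Regexp.escape(word)}\b/).length

puts "substring matches: #{loose}"    # 3
puts "whole word matches: #{whole}"   # 2
```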


<h2>Character classes</h2>

<p>
We can combine characters into character classes with 
square brackets. A character class matches any single character that is specified in the brackets.
The /[ab]/ pattern means a or b, as opposed to /ab/, which means a followed by b.
</p>

<pre class="code">
#!/usr/bin/ruby

words = %w/ sit MIT fit fat lot pad /

pattern = /[fs]it/

words.each do |word|
   if word.match pattern
       puts "#{word} matches the pattern" 
   else
       puts "#{word} does not match the pattern" 
   end
end
</pre>

<p>
We have an array of six three-letter words. We apply a
regular expression with a specific character set
to the strings of the array. 
</p>

<pre class="explanation">
pattern = /[fs]it/
</pre>

<p>
This is the pattern. The pattern looks for 'fit' and
'sit' strings in the array. We use either 'f' or 's'
from the character set. 
</p>

<pre>
$ ./classes.rb
sit matches the pattern
MIT does not match the pattern
fit matches the pattern
fat does not match the pattern
lot does not match the pattern
pad does not match the pattern
</pre>

<p>
There are two matches. 
</p>
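<p>
If only the matching words are needed, the loop can be shortened with the 
<code>grep</code> method of arrays, which keeps the elements that match a 
pattern. A minimal sketch with the same words:
</p>

```ruby
#!/usr/bin/ruby

words = %w[sit MIT fit fat lot pad]
pattern = /[fs]it/

# grep keeps the elements that match the pattern
matching = words.grep(pattern)
p matching    # ["sit", "fit"]

# reject drops them instead
others = words.reject { |word| word =~ pattern }
p others      # ["MIT", "fat", "lot", "pad"]
```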

<hr class="btm">

<p>
In the next example we will further explore the 
character classes.
</p>

<pre class="code">
#!/usr/bin/ruby

p "car".match %r{[abc][a][rs]}
p "car".match /[a-r]+/
p "23af 433a 4ga".scan /\b[a-f0-9]+\b/
</pre>

<p>
The example has three regular expressions with character classes.
</p>

<pre class="explanation">
p "car".match %r{[abc][a][rs]}
</pre>

<p>
The regular expression in this line consists of three character classes. 
Each is for one character. The [abc] is either a, b or c. The [a] is only
a. The third one, [rs], is either r or s. There is a match with the 'car'
string. 
</p>

<pre class="explanation">
p "car".match /[a-r]+/
</pre>

<p>
We can use the hyphen - character inside a character class. The hyphen
is a metacharacter denoting an inclusive range of characters. In our
case, these are all the lowercase letters from a to r. Since
a character class matches only one character, we also use the + repetition
character. This says that the preceding character class
may be repeated one or more times. The 'car' string meets these conditions.
</p>

<pre class="explanation">
p "23af 433a 4ga".scan /\b[a-f0-9]+\b/
</pre>

<p>
In this line, we have a string consisting of three substrings. With the
<code>scan</code> method we look for hexadecimal numbers. 
The character class contains two ranges. The first, a-f, stands for characters
from a to f. The second one, 0-9, stands for digits 0 to 9. 
The + specifies that these characters can be repeated one or more times.
Finally, the \b metacharacters create boundaries, which accept only substrings that
consist solely of these characters.
</p>

<pre>
$ ./classes2.rb
#&lt;MatchData "car"&gt;
#&lt;MatchData "car"&gt;
["23af", "433a"]
</pre>

<p>
Example output.
</p>


<hr class="btm">

<p>
If the first character of a character class is a caret (^) 
the class is inverted. It matches any character except 
those which are specified. 
</p>

<pre class="code">
#!/usr/bin/ruby

p "ABC".match /[^a-z]{3}/
p "abc".match /[^a-z]{3}/
</pre>

<p>
In the example, we use a caret character inside a character
class. 
</p>

<pre class="explanation">
p "ABC".match /[^a-z]{3}/
</pre>

<p>
We look for a sequence of three characters, none of which may
be a lowercase letter from a to z. The "ABC" string matches the regular
expression, because all three characters are uppercase. 
</p>

<pre class="explanation">
p "abc".match /[^a-z]{3}/
</pre>

<p>
The "abc" string does not match. All three characters are in the range
that is excluded from the search.
</p>

<pre>
$ ./caret.rb
#&lt;MatchData "ABC"&gt;
nil
</pre>

<p>
Example output.
</p>
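<p>
Inverted classes are handy for removing unwanted characters. The following 
sketch is an illustration only; the <code>phone</code> string is made up.
</p>

```ruby
#!/usr/bin/ruby

phone = "tel: 555-1234"

# delete every character that is not a digit
digits = phone.gsub(/[^0-9]/, "")
puts digits        # 5551234

# find the first character that is not a lowercase letter
first_other = "abcDef"[/[^a-z]/]
puts first_other   # D
```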


<h2>Quantifiers</h2>

<p>
A quantifier after a token or group specifies how often 
that preceding element is allowed to occur.
</p>

<pre>
 ?     - 0 or 1 match
 *     - 0 or more
 +     - 1 or more
 {n}   - exactly n
 {n,}  - n or more
 {,n}  - n or fewer
 {n,m} - range n to m
</pre>

<p>
The above is a list of common quantifiers.
</p>
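<p>
To get a feel for the list, each quantifier can be tried on the same string. 
The sketch below uses a made-up string of four 'a' characters; the quantifiers 
are greedy, so each one takes as many characters as it is allowed to.
</p>

```ruby
#!/usr/bin/ruby

s = "aaaa"

# String#[] with a regex returns the first matched substring
results = {
  "a?"     => s[/a?/],
  "a*"     => s[/a*/],
  "a+"     => s[/a+/],
  "a{2}"   => s[/a{2}/],
  "a{2,}"  => s[/a{2,}/],
  "a{,3}"  => s[/a{,3}/],
  "a{2,3}" => s[/a{2,3}/]
}

results.each { |pattern, matched| puts "#{pattern.ljust(7)} #{matched.inspect}" }

# zero repetitions still count as a match for *, but not for +
empty = "bbb"[/a*/]
none = "bbb"[/a+/]
p empty   # ""
p none    # nil
```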

<pre class="code">
#!/usr/bin/ruby

p "seven dig moon car lot fire".scan /\w{3}/
p "seven dig moon car lot fire".scan /\b\w{3}\b/
</pre>

<p>
In the example, we want to select those words that
have exactly three characters. The \w sequence matches a word character,
and \w{3} means three consecutive word characters.
</p>

<pre class="explanation">
p "seven dig moon car lot fire".scan /\w{3}/
</pre>

<p>
The first line finds any run of three word characters, so it also cuts 
the first three characters from longer words. This is not exactly what we want. 
</p>

<pre class="explanation">
p "seven dig moon car lot fire".scan /\b\w{3}\b/
</pre>

<p>
This is an improved search. We put the previous pattern between two \b
boundary metacharacters. Now the search will find only those words that
have exactly three characters.
</p>

<pre>
$ ./nchars.rb
["sev", "dig", "moo", "car", "lot", "fir"]
["dig", "car", "lot"]
</pre>

<p>
Output of the example. 
</p>

<hr class="btm">

<p>
The {n,m} is a repetition structure for strings having from n to m
characters. 
</p>

<pre class="code">
#!/usr/bin/ruby

p "I dig moon lottery it fire".scan /\b\w{2,4}\b/
</pre>

<p>
In the above example we choose words that have two, three, or four 
characters. We again use the \b boundary metacharacter to choose whole
words. 
</p>


<pre>
$ ./rchars.rb
["dig", "moon", "it", "fire"]
</pre>

<p>
The example prints an array of words having 2-4 characters. 
</p>

<hr class="btm">

<p>
In the next example, we will present the ? metacharacter. 
A character followed by a ? is optional. Formally, the character
preceding the ? may be present once or 0 times. 
</p>

<pre class="code">
#!/usr/bin/ruby

p "color colour colors colours".scan /colou?rs/
p "color colour colors colours".scan /colou?rs?/

p "color colour colors colours".scan /\bcolor\b|\bcolors\b|\bcolour\b|\bcolours\b/
</pre>

<p>
Say we have a text in which we want to look for the 
word colour. The word has two distinct spellings, British 'colour'
and American 'color'. We want to find both spellings, and we 
want to find their plurals too. 
</p>

<pre class="explanation">
p "color colour colors colours".scan /colou?rs/
</pre>

<p>
The colou?rs pattern finds both 'colours' and 'colors'. The u character, which
precedes the ? metacharacter, is optional. 
</p>

<pre class="explanation">
p "color colour colors colours".scan /colou?rs?/
</pre>

<p>
The colou?rs? pattern makes the u and s characters optional. And so we find
all four colour combinations. 
</p>

<pre class="explanation">
p "color colour colors colours".scan /\bcolor\b|\bcolors\b|\bcolour\b|\bcolours\b/
</pre>

<p>
The same request could be written using alternations. 
</p>

<pre>
$ ./qmark.rb
["colors", "colours"]
["color", "colour", "colors", "colours"]
["color", "colour", "colors", "colours"]
</pre>

<p>
Example output.
</p>

<hr class="btm">

<p>
In the last example of this section, we will show the
+ metacharacter. It allows the preceding character to 
be repeated 1 or more times. 
</p>

<pre class="code">
#!/usr/bin/ruby

nums = %w/ 234 1 23 53434 234532453464 23455636
    324f 34532452343452 343 2324 24221 34$34232/

nums.each do |num|
    m = num.match /[0-9]+/
    
    if m.to_s.eql? num
        puts num
    end              
end
</pre>

<p>
In the example, we have an array of numbers. Numbers can have one
or more number characters. 
</p>

<pre class="explanation">
nums = %w/ 234 1 23 53434 234532453464 23455636
    324f 34532452343452 343 2324 24221 34$34232/
</pre>

<p>
This is an array of strings. Two of them are not numbers, because
they contain non-numerical characters. They must be excluded. 
</p>

<pre class="explanation">
nums.each do |num|
    m = num.match /[0-9]+/
    
    if m.to_s.eql? num
        puts num
    end              
end
</pre>

<p>
We go through the array and apply the regular expression to each string. 
The expression is [0-9]+, which stands for any character from 0 to 9, repeated
one or more times. Note that by default, the regular expression looks for substrings
as well. In 34$34232, the engine considers 34 to be a number. 
The \b boundaries do not help here, because the $ character is not a word character, so
there is a word boundary right after 34. This is why we have included an
if condition in the block. The string is considered a number only if the match is equal 
to the original string. 
</p>

<pre>
$ ./numbers.rb
234
1
23
53434
234532453464
23455636
34532452343452
343
2324
24221
</pre>

<p>
These values are numbers. 
</p>
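<p>
An alternative to comparing the match with the original string is to anchor the 
pattern to the whole string with \A and \z. This is a sketch of that variation, 
not the approach used in the script above; the shortened <code>nums</code> array 
is an assumption made for brevity.
</p>

```ruby
#!/usr/bin/ruby

nums = %w[234 1 324f 343 34$34232]

# \A and \z force the digits to span the entire string
valid = nums.select { |num| num =~ /\A[0-9]+\z/ }

p valid    # ["234", "1", "343"]
```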


<h2>Case insensitive search</h2>

<p>
We can perform a case insensitive search. A regular expression
can be followed by an option. It is a single character that
modifies the pattern in some way. In case of a case insensitive
search, we apply the i option.
</p>

<pre class="code">
#!/usr/bin/ruby

p "Jane".match /Jane/
p "Jane".match /jane/
p "Jane".match /JANE/

p "Jane".match /jane/i
p "Jane".match /Jane/i
p "Jane".match /JANE/i
</pre>

<p>
The example shows both case sensitive and case insensitive search.
</p>

<pre class="explanation">
p "Jane".match /Jane/
p "Jane".match /jane/
p "Jane".match /JANE/
</pre>

<p>
In these three lines the characters must exactly match the
pattern. Only the first line gives a match. 
</p>

<pre class="explanation">
p "Jane".match /jane/i
p "Jane".match /Jane/i
p "Jane".match /JANE/i
</pre>

<p>
Here we use the i option, which follows the second / character. 
We do a case insensitive search. All three lines match. 
</p>

<pre>
$ ./icase.rb
#&lt;MatchData "Jane"&gt;
nil
nil
#&lt;MatchData "Jane"&gt;
#&lt;MatchData "Jane"&gt;
#&lt;MatchData "Jane"&gt;
</pre>

<p>
Output of the example. 
</p>
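<p>
When a pattern is created with <code>Regexp.new</code> instead of the slashes, 
the same option can be passed as the <code>Regexp::IGNORECASE</code> constant. 
A minimal sketch:
</p>

```ruby
#!/usr/bin/ruby

# equivalent to the /jane/i literal
re = Regexp.new("jane", Regexp::IGNORECASE)

p re                   # /jane/i
p re.casefold?         # true, the pattern ignores case

m = "JANE".match(re)
p m[0]                 # "JANE"
```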


<h2>Alternation</h2>

<p>
The next example explains the alternation operator (|). This operator
lets us create a regular expression with several choices. 
</p>

<pre class="code">
#!/usr/bin/ruby

names = %w/Jane Thomas Robert Lucy Beky
    John Peter Andy/
    
pattern = /Jane|Beky|Robert/ 
    
names.each do |name|    
    
    if name =~ pattern
        puts "#{name} is my friend"
    else
        puts "#{name} is not my friend"
    end
end
</pre>

<p>
We have 8 names in the names array. We will look for any of several
strings in that array. 
</p>


<pre class="explanation">
pattern = /Jane|Beky|Robert/ 
</pre>

<p>
This is the search pattern. It says that Jane, Beky, and Robert are my friends. 
If you find any of them, you have found my friend. 
</p>


<pre>
$ ./alternation.rb
Jane is my friend
Thomas is not my friend
Robert is my friend
Lucy is not my friend
Beky is my friend
John is not my friend
Peter is not my friend
Andy is not my friend
</pre>

<p>
Output of the script. 
</p>


<h2>Subpatterns</h2>

<p>
We can use parentheses () to create subpatterns inside patterns.
</p>

<pre class="code">
#!/usr/bin/ruby

p "bookworm" =~ /book(worm)?$/
p "book" =~ /book(worm)?$/
p "worm" =~ /book(worm)?$/
p "bookstore" =~ /book(worm)?$/
</pre>

<p>
We have the following regex pattern: <b>book(worm)?$</b>. The <b>(worm)</b> is
a subpattern. Of the four strings, only two match: 'bookworm' and 'book'. 
The ? character follows the subpattern, which means that the subpattern
may appear 0 or 1 time in the final pattern. The $ character anchors the match to the end
of the string. Without it, words like bookstore or bookmania would match too.
</p>

<pre class="code">
#!/usr/bin/ruby

p "book" =~ /book(shelf|worm)?$/
p "bookshelf" =~ /book(shelf|worm)?$/
p "bookworm" =~ /book(shelf|worm)?$/
p "bookstore" =~ /book(shelf|worm)?$/
</pre>

<p>
Subpatterns are often used with alternation. We can then create multiple word combinations.
For example, the <b>(shelf|worm)</b> subpattern enables us to search for the words 'bookshelf' and
'bookworm'. With the ? metacharacter, the pattern also matches 'book'.
</p>

<pre>
$ ./subpatterns2.rb
0
0
0
nil
</pre>

<p>
This is the output of the second script. The last string does not match. Remember that the 0s do not mean
that there was no match. For the =~ operator, 0 is the index of the first character
of the matched string. 
</p>
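<p>
Parentheses also capture the text matched by the subpattern. With the 
<code>match</code> method, the captured text is available as index 1 of the 
returned <code>MatchData</code> object. The following sketch collects the 
captures in a hash; the <code>:no_match</code> symbol is just a marker chosen 
for this example.
</p>

```ruby
#!/usr/bin/ruby

pattern = /book(shelf|worm)?$/
words = %w[book bookshelf bookworm bookstore]

captures = {}
words.each do |word|
  m = word.match(pattern)
  # m[1] is the text captured by the first pair of parentheses,
  # or nil when the optional subpattern did not take part in the match
  captures[word] = m ? m[1] : :no_match
end

captures.each { |word, captured| puts "#{word}: #{captured.inspect}" }
```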


<h2>Email example</h2>

<p>
In the final example, we create a regex pattern for checking email
addresses.
</p>


<pre class="code">
#!/usr/bin/ruby

emails = %w/ luke@gmail.com andy@yahoocom 23214sdj^as
    f3444@gmail.com /
    
pattern = /^[a-zA-Z0-9._-]+@[a-zA-Z0-9-]+\.[a-zA-Z.]{2,5}$/

emails.each do |email| 

    if email.match pattern
        puts "#{email} matches"
    else
        puts "#{email} does not match"
    end
    
end
</pre>

<p>
Note that this example provides only one solution. It does not have to be the
best one. 
</p>

<pre class="explanation">
emails = %w/ luke@gmail.com andy@yahoocom 23214sdj^as
    f3444@gmail.com /
</pre>

<p>
This is an array of emails. Only two of them are valid.
</p>

<pre class="explanation">
pattern = /^[a-zA-Z0-9._-]+@[a-zA-Z0-9-]+\.[a-zA-Z.]{2,5}$/
</pre>

<p>
This is the pattern. The first ^ and the last $ characters are here to 
get an exact pattern match. No characters before and after the pattern are allowed.
The email is divided into five parts. 
The first part is the local part. This is usually the name of a company, an individual, or a nickname.
The <b>[a-zA-Z0-9._-]+</b> lists all possible characters that we can use in the local part. 
They can be used one or more times. 
The second part is the literal @ character. The third part is the domain part. It is usually
the domain name of the email provider, like yahoo or gmail. The <b>[a-zA-Z0-9-]+</b> is a character set
providing all characters that can be used in the domain name. The + quantifier requires one or more
of these characters. The fourth part is the dot character. It is preceded by the escape character (\.).
This is because the dot character is a metacharacter and has a special meaning. By escaping it,
we get a literal dot. The final part is the top level domain. The pattern is as follows: <b>[a-zA-Z.]{2,5}</b>.
With this pattern, top level domains can have from 2 to 5 characters, like sk, net, or info. There is also a dot character in the set.
This is because some top level domains have two parts, for example co.uk.
</p>

<pre>
$ ./email.rb
luke@gmail.com matches
andy@yahoocom does not match
23214sdj^as does not match
f3444@gmail.com matches
</pre>

<p>
The regular expression marked two strings as valid email addresses. 
</p>
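<p>
One caveat with the ^ and $ anchors is that they match at line boundaries, so 
a string with extra lines can still pass the check. A stricter variant, shown 
here only as a sketch, swaps them for \A and \z. The <code>line_pattern</code>, 
<code>string_pattern</code>, and <code>input</code> names are assumptions for 
this illustration.
</p>

```ruby
#!/usr/bin/ruby

line_pattern   = /^[a-zA-Z0-9._-]+@[a-zA-Z0-9-]+\.[a-zA-Z.]{2,5}$/
string_pattern = /\A[a-zA-Z0-9._-]+@[a-zA-Z0-9-]+\.[a-zA-Z.]{2,5}\z/

# a valid address followed by an unrelated second line
input = "luke@gmail.com\nsecond line"

# the first line alone satisfies ^ ... $
line_result = input =~ line_pattern
p line_result      # 0

# \A ... \z must cover the whole string, so this fails
string_result = input =~ string_pattern
p string_result    # nil

# a plain address still passes the stricter pattern
plain_result = "luke@gmail.com" =~ string_pattern
p plain_result     # 0
```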

<p>
In this chapter, we have covered regular expressions in Ruby.
</p>



<div class="footer">
<div class="signature">
<a href="/">ZetCode</a> last modified December 15, 2011  <span class="copyright">&copy; 2007 - 2013 Jan Bodnar</span>
</div>
</div>

</div> <!-- content -->

</div> <!-- container -->

</body>
</html>
